On the basis of two-speaker spontaneous conversations, it is shown that the distributions of both pauses and speech-overlaps of telephone and face-to-face dialogues have different statistical properties. Pauses in a face-to-face dialogue last up to 4 times longer than pauses in telephone conversations in functionally comparable conditions. There is a high correlation (0.88 or larger) between the average...
This paper describes the SpexKit framework for the development of spoken dialogue systems, which is currently used to implement prototypes of a bilingual city information system. We sketch the overall architecture of this speech platform, its dialogue manager and its scripting language, as well as the integration of speech technology components like ASR or TTS systems.
This paper is about the automated production of dialogue models. The goal is to propose and validate a methodology that allows the production of finalized dialogue models (i.e. dialogue models specific for given applications) in a few hours. The solution we propose for such a methodology, called the Rapid Dialogue Prototyping Methodology (RDPM), is decomposed into five consecutive main steps, namely:...
Using voice to access on-line information from the web would be very useful, given the proliferation of mobile devices that allow Internet access anytime and anywhere. However, a vocal interface is sequential and not persistent, so the information must be restructured to achieve efficient and natural interaction. Our proposal is based on converting original web contents...
The central module of any natural language dialogue system is the dialogue manager, which plays the role of an intermediate agent between the user and the information source. Its cooperativity and portability largely determine the efficiency of the dialogue system. Therefore, as the basis for cooperativity of information-providing dialogue systems we propose a knowledge representation of the information...
This paper focuses on improving visual Czech speech synthesis. Our aim was the design of a highly natural and realistic talking head with a realistic 3D face model, improved co-articulation, and a realistic model of inner articulatory organs (teeth, tongue, and palate). Besides very good articulation, our aim was also the expression of facial gestures and emotions by the talking head. The intelligibility...
In this study, the usability of two versions of a web-based electronic literature list and information system for blind and visually disabled people was evaluated. Because of the access possibilities of the target group, applicability for a speech-controlled interface (screen reader, speech-controlled web browser) was one point of interest. Furthermore, there was a focus on the integration of different...
Unreliable speech recognition, especially in noisy environments, and the need for more natural interaction between man and machine have motivated the development of multimodal systems using speech, pointing, gaze, and facial expressions. In this paper we present a new approach to fusing multimodal information streams using agents. A general framework based on this approach that allows for rapid application...
Two sets of linguistic features are developed: the first estimates whether a single step in a dialogue between a human and a machine is successful or not; the second classifies dialogues as a whole. The features are based on part-of-speech (POS) labels, word statistics, and properties of turns and dialogues. Experiments were carried out on the SympaFly corpus, data from a real application...
We present a logical approach to spoken language understanding for a human-machine dialogue system. The aim of the analysis is to provide a logical formula, or a conceptual graph, by assembling concepts related to a delimited application domain. This flexible structure is gradually built during an incremental parsing, which is meant to combine syntactic and semantic criteria. Then, a contextual understanding...
Discourse in formal domains, such as mathematics, is characterized by a mixture of natural language and embedded formal expressions. Based on an investigation of a collected corpus of informal dialogues on naive set theory proofs, we are developing a dependency-based lexicalist grammar for parsing input with different degrees of verbalization of the mathematical content: ranging from symbolic alone...
Where have we been and where are we going? Three types of answers will be discussed: consistent progress, oscillations and discontinuities. Moore’s Law provides a convincing demonstration of consistent progress, when it applies. Speech recognition error rates are declining by 10× per decade; speech coding rates are declining by 2× per decade. Unfortunately, fields do not always move in consistent...
We present a new approach to determining the meaning of words in text, which relies on assigning senses to the contexts within which words occur, rather than to the words themselves. A preliminary version of this approach is presented in Pustejovsky, Hanks and Rumshisky (2004, COLING). We argue that word senses are not directly encoded in the lexicon of a language, but rather that each word is associated...
I will first sketch some background on the company ScanSoft. Next, I will discuss ScanSoft’s products and technologies, which include digital imaging and OCR technology, automatic speech recognition technology (ASR), text-to-speech technology (TTS), dialogue technology, including multimodal dialogues, dictation technology and audiomining technology. I will sketch the basic functionality of these technologies,...